Data Description:

The Indian Premier League (IPL) is a professional Twenty20 (T20) cricket league that started in India in 2008. It was initiated by the BCCI with 8 franchises fielding players from around the world. The first IPL auction was held in 2008 for ownership of the teams for 10 years, with a base price of USD 50 million. The franchises acquire players through an auction conducted every year, subject to several rules imposed by the IPL. Player performance can be measured in many ways. Although the IPL follows the Twenty20 format, a player's performance in other formats, such as Test and ODI matches, may also influence their price. The dataset contains the performance of 130 players, measured through various batting and bowling metrics.

Problem statement

Use regression coefficients to understand which player features influence SOLD PRICE.
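Coefficients are easiest to compare across features when the features share a scale. A minimal sketch, assuming z-scored features and plain NumPy least squares (not the statsmodels `ols` imported below): each coefficient is then the change in price per one-standard-deviation change in that feature. The helper `fit_standardized_ols` and the toy columns `x1`, `x2` and `price` are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_standardized_ols(df, features, target):
    """Ordinary least squares of `target` on z-scored `features`.

    Returns a Series of coefficients indexed by feature name, plus 'intercept'.
    A constant feature column has zero standard deviation, so drop it first.
    """
    X = df[list(features)].astype(float)
    # z-score each feature (population std) so coefficients share a common scale
    X = (X - X.mean()) / X.std(ddof=0)
    # Prepend a column of ones for the intercept term
    A = np.column_stack([np.ones(len(X)), X.to_numpy()])
    y = df[target].astype(float).to_numpy()
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return pd.Series(coef, index=["intercept"] + list(features))


# Hypothetical toy data: price depends on x1 only (price = 10 + 3 * x1)
toy = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [0, 1, 0, 1], "price": [13, 16, 19, 22]})
coefs = fit_standardized_ols(toy, ["x1", "x2"], "price")
print(coefs)
```

On the real data this could be called as, for example, `fit_standardized_ols(dt, ['T-RUNS', 'ODI-RUNS-S', 'WKTS'], 'SOLD PRICE')`. With only 130 rows and strongly correlated batting and bowling metrics, individual coefficients can be unstable, so they are best read alongside their standard errors, which the statsmodels `ols` summary reports.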

Importing the required libraries

In [1]:
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from statsmodels.formula.api import ols
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression, LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (accuracy_score, f1_score, confusion_matrix, classification_report,
                             auc, roc_curve, mean_squared_error, mean_absolute_error, r2_score)
In [2]:
dt=pd.read_csv('IPL IMB381IPL2013.csv')
In [3]:
dt.head()
Out[3]:
Sl.NO. PLAYER NAME AGE COUNTRY TEAM PLAYING ROLE T-RUNS T-WKTS ODI-RUNS-S ODI-SR-B ... SR-B SIXERS RUNS-C WKTS AVE-BL ECON SR-BL AUCTION YEAR BASE PRICE SOLD PRICE
0 1 Abdulla, YA 2 SA KXIP Allrounder 0 0 0 0.00 ... 0.00 0 307 15 20.47 8.90 13.93 2009 50000 50000
1 2 Abdur Razzak 2 BAN RCB Bowler 214 18 657 71.41 ... 0.00 0 29 0 0.00 14.50 0.00 2008 50000 50000
2 3 Agarkar, AB 2 IND KKR Bowler 571 58 1269 80.62 ... 121.01 5 1059 29 36.52 8.81 24.90 2008 200000 350000
3 4 Ashwin, R 1 IND CSK Bowler 284 31 241 84.56 ... 76.32 0 1125 49 22.96 6.23 22.14 2011 100000 850000
4 5 Badrinath, S 2 IND CSK Batsman 63 0 79 45.93 ... 120.71 28 0 0 0.00 0.00 0.00 2011 100000 800000

5 rows × 26 columns

In [4]:
dt.columns
Out[4]:
Index(['Sl.NO.', 'PLAYER NAME', 'AGE', 'COUNTRY', 'TEAM', 'PLAYING ROLE',
       'T-RUNS', 'T-WKTS', 'ODI-RUNS-S', 'ODI-SR-B', 'ODI-WKTS', 'ODI-SR-BL',
       'CAPTAINCY EXP', 'RUNS-S', 'HS', 'AVE', 'SR-B', 'SIXERS', 'RUNS-C',
       'WKTS', 'AVE-BL', 'ECON', 'SR-BL', 'AUCTION YEAR', 'BASE PRICE',
       'SOLD PRICE'],
      dtype='object')
In [5]:
dt.dtypes
Out[5]:
Sl.NO.             int64
PLAYER NAME       object
AGE                int64
COUNTRY           object
TEAM              object
PLAYING ROLE      object
T-RUNS             int64
T-WKTS             int64
ODI-RUNS-S         int64
ODI-SR-B         float64
ODI-WKTS           int64
ODI-SR-BL        float64
CAPTAINCY EXP      int64
RUNS-S             int64
HS                 int64
AVE              float64
SR-B             float64
SIXERS             int64
RUNS-C             int64
WKTS               int64
AVE-BL           float64
ECON             float64
SR-BL            float64
AUCTION YEAR       int64
BASE PRICE         int64
SOLD PRICE         int64
dtype: object
In [6]:
dt.isnull().sum()
Out[6]:
Sl.NO.           0
PLAYER NAME      0
AGE              0
COUNTRY          0
TEAM             0
PLAYING ROLE     0
T-RUNS           0
T-WKTS           0
ODI-RUNS-S       0
ODI-SR-B         0
ODI-WKTS         0
ODI-SR-BL        0
CAPTAINCY EXP    0
RUNS-S           0
HS               0
AVE              0
SR-B             0
SIXERS           0
RUNS-C           0
WKTS             0
AVE-BL           0
ECON             0
SR-BL            0
AUCTION YEAR     0
BASE PRICE       0
SOLD PRICE       0
dtype: int64
In [7]:
dt.shape
Out[7]:
(130, 26)
In [8]:
dt['AUCTION YEAR'].unique()
Out[8]:
array([2009, 2008, 2011, 2010], dtype=int64)
  • The dataset contains batsmen, bowlers, all-rounders and wicket-keepers. Let's split the players by playing role and analyse each group separately.
In [36]:
dt_Allrounder=dt[dt['PLAYING ROLE']=='Allrounder']
In [37]:
dt_Allrounder=dt_Allrounder.sort_values('SOLD PRICE',axis=0,ascending=False)
In [38]:
dt_Allrounder['PLAYING ROLE'].unique()
Out[38]:
array(['Allrounder'], dtype=object)
In [39]:
dt_Bowler=dt[dt['PLAYING ROLE']=='Bowler']
In [40]:
dt_Bowler=dt_Bowler.sort_values('SOLD PRICE',axis=0,ascending=False)
In [41]:
dt_Batsman=dt[dt['PLAYING ROLE']=='Batsman']
In [42]:
dt_Batsman=dt_Batsman.sort_values('SOLD PRICE',axis=0,ascending=False)
In [43]:
dt_WKeeper=dt[dt['PLAYING ROLE']=='W. Keeper']
In [44]:
dt_WKeeper=dt_WKeeper.sort_values('SOLD PRICE',axis=0,ascending=False)
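The four filter-and-sort cells above repeat the same pattern. A minimal sketch of doing it in one pass with `groupby`; the helper name `split_by_role` is hypothetical and the toy rows are made up for illustration.

```python
import pandas as pd


def split_by_role(df, role_col="PLAYING ROLE", price_col="SOLD PRICE"):
    """Return {role: players of that role sorted by price, highest first}."""
    return {
        role: group.sort_values(price_col, ascending=False)
        for role, group in df.groupby(role_col)
    }


# Hypothetical toy rows that reuse the dataset's column names
toy = pd.DataFrame({
    "PLAYER NAME": ["A", "B", "C", "D"],
    "PLAYING ROLE": ["Batsman", "Bowler", "Batsman", "W. Keeper"],
    "SOLD PRICE": [100000, 50000, 800000, 200000],
})
by_role = split_by_role(toy)
print(by_role["Batsman"])
```

Called as `by_role = split_by_role(dt)`, the entry `by_role['Allrounder']` holds the same rows as `dt_Allrounder`, in price order. `groupby` skips rows whose role is missing (its default `dropna=True`); the `isnull()` check above shows there are none here.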
In [62]:
dt['TEAM'].unique()
Out[62]:
array(['KXIP', 'RCB', 'KKR', 'CSK', 'RR', 'MI', 'DD', 'DC', 'KX', 'KXI+'],
      dtype=object)
In [63]:
dt['TEAM']=dt['TEAM'].str.rstrip('+')   # drop the trailing '+' marker, e.g. 'CSK+' -> 'CSK'
In [64]:
dt['TEAM']=dt['TEAM'].replace(['KXI','KX'],['KXIP','KXIP'])   # unify the Kings XI Punjab codes
  • I have divided the dataset into four parts, so we now have separate datasets for batsmen, bowlers, all-rounders and wicket-keepers.
In [21]:
# Use every row (not just the first 30) for an overall comparison
data = dt.groupby(['TEAM', 'AUCTION YEAR'])['SOLD PRICE'].sum().reset_index()
fig = px.bar(data, x='TEAM', y='SOLD PRICE', color='AUCTION YEAR')
fig.update_layout(title_text = "Overall comparison of team spends")
fig.show()
fig = px.line(data, x='AUCTION YEAR', y='SOLD PRICE', color='TEAM', symbol='TEAM')
fig.update_layout(title_text = "Overall comparison of team spends by auction year")
fig.show()

Observation

  • The chart above suggests that CSK has made the highest total spend on its squad across the auction years in the data (2008 to 2011).
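A bar chart makes rankings easy to misread, so a quick numeric check is to rank teams by total SOLD PRICE over every row of the frame. A minimal sketch; the helper name `total_spend_by_team` is hypothetical and the toy figures are made up.

```python
import pandas as pd


def total_spend_by_team(df, team_col="TEAM", price_col="SOLD PRICE"):
    """Sum SOLD PRICE per team over all rows and rank teams from highest to lowest."""
    return df.groupby(team_col)[price_col].sum().sort_values(ascending=False)


# Hypothetical toy rows
toy = pd.DataFrame({
    "TEAM": ["CSK", "MI", "CSK", "RCB", "MI"],
    "SOLD PRICE": [850000, 300000, 800000, 400000, 200000],
})
ranking = total_spend_by_team(toy)
print(ranking)
```

On the real data, variant team codes (such as the 'KXIP', 'KX' and 'KXI+' values in the `unique()` output above) would split one franchise's total across several rows, so TEAM should be normalised before calling `total_spend_by_team(dt)`.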
In [45]:
data = dt_Batsman.head(30).groupby(['TEAM', 'PLAYER NAME'])['SOLD PRICE'].sum().reset_index()
fig = px.bar(data, x='PLAYER NAME', y='SOLD PRICE', color='TEAM')
fig.update_layout(title_text = "Team spend on the top-30 batsmen")
fig.show()

Observation

  • The chart shows that Virat, Sachin, Yuvraj and Sehwag received the highest bids.
In [23]:
fig =px.bar(dt_Batsman.head(30), y="PLAYER NAME", x="SOLD PRICE", text='AVE',color='T-RUNS',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 batsman auction buys w.r.t. Test runs (T-RUNS) and average (AVE)")
fig.show()

Observation

  • Virat, Sachin, Yuvraj and Sehwag again stand out with the highest bids.
  • Players with a better average and more runs tend to sell at higher prices.
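The link between better averages, more runs and higher prices can be put into numbers. A minimal sketch, assuming Spearman rank correlation is a suitable measure (it tracks monotonic association and is not dominated by a few very large prices); the helper name and toy values are hypothetical. A correlation describes association only, not why a player was bought.

```python
import pandas as pd


def rank_correlation_with_price(df, features, price_col="SOLD PRICE"):
    """Spearman correlation of each feature with price, computed as Pearson on ranks."""
    cols = list(features)
    ranks = df[cols + [price_col]].rank()  # ties get their average rank
    return ranks[cols].corrwith(ranks[price_col]).sort_values(ascending=False)


# Hypothetical toy rows
toy = pd.DataFrame({
    "AVE": [20.5, 35.1, 28.0, 41.2],
    "T-RUNS": [500, 8000, 2000, 12000],
    "ECON": [9.0, 6.5, 8.0, 6.0],
    "SOLD PRICE": [100000, 900000, 400000, 1800000],
})
r = rank_correlation_with_price(toy, ["AVE", "T-RUNS", "ECON"])
print(r)
```

For the batsmen, `rank_correlation_with_price(dt_Batsman, ['AVE', 'T-RUNS', 'ODI-RUNS-S', 'ODI-SR-B', 'SIXERS'])` would give one number per metric, which is easier to compare than several bar charts.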
In [24]:
fig =px.bar(dt_Batsman.head(30), y="PLAYER NAME", x="SOLD PRICE", text='ODI-RUNS-S',color='ODI-SR-B',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 batsman auction buys w.r.t. ODI runs and ODI strike rate")
fig.show()

Observation

  • In T20 cricket strike rate matters, and the chart above is consistent with that.
  • Players with a higher ODI strike rate and more ODI runs tend to sell at higher prices.
In [25]:
fig =px.bar(dt_Batsman.head(30), y="PLAYER NAME", x="SIXERS", text='T-RUNS',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 batsman auction buys w.r.t. Test runs and sixes")
fig.show()

Observation

  • The chart suggests that the number of sixes a batsman has hit does not strongly affect the sold price.
  • For example, S. Raina has hit the most sixes among these players but was not sold at a particularly high price.
In [46]:
data = dt_WKeeper.head(30).groupby(['TEAM', 'PLAYER NAME'])['SOLD PRICE'].sum().reset_index()
fig = px.bar(data, x='PLAYER NAME', y='SOLD PRICE', color='TEAM')
fig.update_layout(title_text = "Team spend on wicket-keepers")
fig.show()

Observation

  • MS Dhoni was sold at the highest price, to CSK.
In [47]:
fig =px.bar(dt_WKeeper.head(30), y="PLAYER NAME", x="SOLD PRICE", text='AVE',color='T-RUNS',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 wicket-keeper auction buys w.r.t. Test runs (T-RUNS) and average (AVE)")
fig.show()
In [48]:
fig =px.bar(dt_WKeeper.head(30), y="PLAYER NAME", x="ODI-SR-B", text='ODI-RUNS-S',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 wicket-keeper auction buys w.r.t. ODI runs and ODI strike rate")
fig.show()

Observation

  • The chart suggests that wicket-keepers with more ODI runs and a better ODI strike rate are sold at higher prices.
In [49]:
fig =px.bar(dt_WKeeper.head(30), y="PLAYER NAME", x="SIXERS", text='T-RUNS',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 wicket-keeper auction buys w.r.t. Test runs and sixes")
fig.show()

Observation

  • The chart suggests that wicket-keepers with more runs and more sixes are sold at higher prices.
In [50]:
data = dt_Allrounder.head(30).groupby(['TEAM', 'PLAYER NAME'])['SOLD PRICE'].sum().reset_index()
fig = px.bar(data, x='PLAYER NAME', y='SOLD PRICE', color='TEAM')
fig.update_layout(title_text = "Team spend on the top-30 all-rounders")
fig.show()

Observation

  • The chart shows that CSK and DC spent heavily on all-rounders.
In [51]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="SOLD PRICE", text='AVE',color='T-RUNS',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. Test runs (T-RUNS) and average (AVE)")
fig.show()

Observation

  • The chart suggests that Test runs have little bearing on the sold price of all-rounders.
In [52]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="SOLD PRICE", text='ODI-RUNS-S',color='ODI-SR-B',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. ODI runs and ODI strike rate")
fig.show()

Observation

  • ODI runs and a good ODI strike rate appear to be associated with higher prices for all-rounders.
In [55]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="SOLD PRICE", text='T-RUNS',color='SIXERS',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. Test runs and sixes")
fig.show()

Observation

  • Neither Test runs nor sixes hit appear to be related to the sold price of all-rounders.
In [56]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="SOLD PRICE", text='T-WKTS',color='SIXERS',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. Test wickets and sixes")
fig.show()

Observation

  • The number of Test wickets (T-WKTS) appears to be associated with the sold price of all-rounders.
In [57]:
fig =px.bar(dt_Bowler.head(30), y="PLAYER NAME", x="T-WKTS", text='SR-BL',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 bowler auction buys w.r.t. Test wickets and IPL bowling strike rate")
fig.show()

Observation

  • The number of Test wickets appears to be associated with the sold price of bowlers.
  • Bowlers with more wickets tend to have a higher sold price.
In [58]:
fig =px.bar(dt_Bowler.head(30), y="PLAYER NAME", x="ODI-WKTS", text='ODI-SR-BL',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 bowler auction buys w.r.t. ODI wickets and ODI bowling strike rate")
fig.show()

Observation

  • The number of ODI wickets appears to be associated with the sold price of bowlers.
  • Bowlers with more ODI wickets tend to have a higher sold price.
In [33]:
fig =px.bar(dt_Bowler.head(30), y="PLAYER NAME", x="ECON", text='SR-BL',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 bowler auction buys w.r.t. economy rate and bowling strike rate")
fig.show()

Observation

  • Bowlers with a lower economy rate and a better bowling strike rate (fewer balls per wicket) tend to have a higher sold price.
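In the first rows of the data (Out[3]), a player who took no IPL wickets has AVE-BL and SR-BL recorded as 0.00 (Abdur Razzak), and a player who did not bowl also has ECON 0.00 (Badrinath, S). In a chart those zeros look like the best possible economy or strike rate. A minimal sketch, assuming a 0 in these columns means "not applicable" rather than a real value, masks them as NaN before bowlers are compared; the helper name is hypothetical.

```python
import pandas as pd

# Assumption: 0 in these bowling-rate columns means "not applicable", not a measurement
BOWLING_RATE_COLS = ("AVE-BL", "ECON", "SR-BL")


def mask_zero_bowling_rates(df, cols=BOWLING_RATE_COLS):
    """Return a copy with zeros in the bowling-rate columns replaced by NaN."""
    cols = list(cols)
    out = df.copy()
    out[cols] = out[cols].mask(out[cols] == 0)  # mask() fills NaN where the condition holds
    return out


# Values copied from the first rows shown in Out[3]
rows = pd.DataFrame({
    "PLAYER NAME": ["Abdulla, YA", "Abdur Razzak", "Badrinath, S"],
    "AVE-BL": [20.47, 0.00, 0.00],
    "ECON": [8.90, 14.50, 0.00],
    "SR-BL": [13.93, 0.00, 0.00],
})
clean = mask_zero_bowling_rates(rows)
print(clean)
```

Plotly leaves NaN bars empty, so after `dt_Bowler = mask_zero_bowling_rates(dt_Bowler)` the economy and strike-rate charts would no longer show non-bowlers as the most economical players.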
In [59]:
fig =px.bar(dt_Bowler.head(30), y="PLAYER NAME", x="RUNS-C", text='T-WKTS',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 bowler auction buys w.r.t. IPL runs conceded and Test wickets")
fig.show()

Observation

  • Bowlers who have conceded fewer runs and taken more wickets tend to fetch a better sold price.
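Runs conceded and wickets can be combined into a single figure, runs per wicket. In the rows shown in Out[3], RUNS-C / WKTS reproduces the AVE-BL column to two decimals (307 / 15 ≈ 20.47 for Abdulla, 1059 / 29 ≈ 36.52 for Agarkar), which suggests AVE-BL is the IPL bowling average. A minimal sketch that recomputes it and leaves players with no wickets as NaN; the helper name is hypothetical.

```python
import pandas as pd


def runs_per_wicket(df, runs_col="RUNS-C", wkts_col="WKTS"):
    """Runs conceded per wicket, rounded to 2 decimals; NaN where a player has no wickets."""
    wkts = df[wkts_col].where(df[wkts_col] > 0)  # 0 wickets -> NaN, avoids division by zero
    return (df[runs_col] / wkts).round(2)


# RUNS-C, WKTS and AVE-BL copied from the rows shown in Out[3]
rows = pd.DataFrame({
    "PLAYER NAME": ["Abdulla, YA", "Abdur Razzak", "Agarkar, AB", "Ashwin, R"],
    "RUNS-C": [307, 29, 1059, 1125],
    "WKTS": [15, 0, 29, 49],
    "AVE-BL": [20.47, 0.00, 36.52, 22.96],
})
rows["AVE-BL (recomputed)"] = runs_per_wicket(rows)
print(rows)
```

A single ratio is easier to colour a chart by than two separate columns, and unlike the stored 0.00 it does not reward a bowler for taking no wickets.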
In [35]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="T-WKTS", text='SR-BL',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. Test wickets and IPL bowling strike rate")
fig.show()
In [36]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="ODI-WKTS", text='ODI-SR-BL',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. ODI wickets and ODI bowling strike rate")
fig.show()
In [37]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="WKTS", text='AVE-BL',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. IPL wickets and bowling average")
fig.show()
In [38]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="ECON", text='SR-BL',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. economy rate and bowling strike rate")
fig.show()
In [39]:
fig =px.bar(dt_Allrounder.head(30), y="PLAYER NAME", x="RUNS-C", text='WKTS',color='SOLD PRICE',orientation='h', height=600)
fig.update_layout(title_text = "Top-30 all-rounder auction buys w.r.t. IPL runs conceded and IPL wickets")
fig.show()
In [65]:
teams = dt['TEAM'].unique()
for team in teams:
    data = dt[dt['TEAM'] == team].groupby('AUCTION YEAR')['SOLD PRICE'].sum().reset_index()
    fig = px.bar(data, x='AUCTION YEAR', y='SOLD PRICE', color='SOLD PRICE')
    fig.update_layout(title_text = f"{team}: total spend by auction year")
    fig.show()

Observation

  • The charts show how much each team spent in each auction year.
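The loop above draws one chart per team. The same totals fit in a single team-by-year table, which is easier to compare side by side. A minimal sketch; the helper `spend_table` and the toy rows are hypothetical.

```python
import pandas as pd


def spend_table(df, by, year_col="AUCTION YEAR", price_col="SOLD PRICE"):
    """Total SOLD PRICE with one row per `by` value and one column per auction year."""
    return df.pivot_table(index=by, columns=year_col, values=price_col,
                          aggfunc="sum", fill_value=0)


# Hypothetical toy rows
toy = pd.DataFrame({
    "TEAM": ["CSK", "CSK", "MI", "RCB"],
    "AUCTION YEAR": [2008, 2011, 2008, 2011],
    "SOLD PRICE": [1500000, 850000, 300000, 400000],
})
table = spend_table(toy, "TEAM")
print(table)
```

`spend_table(dt, "TEAM")` covers the loop above; passing `by="COUNTRY"` or `by="PLAYING ROLE"` gives the tables behind the next two loops. `fill_value=0` shows a year with no purchases as 0 rather than NaN.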
In [66]:
countries = dt['COUNTRY'].unique()
for country in countries:
    data = dt[dt['COUNTRY'] == country].groupby('AUCTION YEAR')['SOLD PRICE'].sum().reset_index()
    fig = px.bar(data, x='AUCTION YEAR', y='SOLD PRICE', color='SOLD PRICE')
    fig.update_layout(title_text = f"Spend on players from {country} by auction year")
    fig.show()

Observation

  • The charts show how much the teams spent in each auction year on players from each country.
In [67]:
roles = dt['PLAYING ROLE'].unique()
for role in roles:
    data = dt[dt['PLAYING ROLE'] == role].groupby('AUCTION YEAR')['SOLD PRICE'].sum().reset_index()
    fig = px.bar(data, x='AUCTION YEAR', y='SOLD PRICE', color='AUCTION YEAR')
    fig.update_layout(title_text = f"Spend on {role} players by auction year")
    fig.show()

Observation

  • The charts show how much the teams spent in each auction year on batsmen, bowlers, all-rounders and wicket-keepers.
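Raw totals make it hard to tell whether the mix of roles changed, because total spend differs from year to year. A minimal sketch, assuming each role's share of that year's total spend is the quantity of interest; the helper name and toy rows are hypothetical.

```python
import pandas as pd


def role_share_by_year(df, role_col="PLAYING ROLE", year_col="AUCTION YEAR",
                       price_col="SOLD PRICE"):
    """Fraction of each auction year's total SOLD PRICE spent on each playing role."""
    totals = df.pivot_table(index=year_col, columns=role_col, values=price_col,
                            aggfunc="sum", fill_value=0)
    return totals.div(totals.sum(axis=1), axis=0)  # each row sums to 1


# Hypothetical toy rows
toy = pd.DataFrame({
    "PLAYING ROLE": ["Batsman", "Bowler", "Batsman", "Allrounder"],
    "AUCTION YEAR": [2008, 2008, 2011, 2011],
    "SOLD PRICE": [600000, 200000, 300000, 700000],
})
shares = role_share_by_year(toy)
print(shares)
```

Because every row sums to 1, a stacked bar of `role_share_by_year(dt)` compares the role mix across years on the same scale, independent of how much was spent overall.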
In [68]:
sort_vals = dt.sort_values(by='SOLD PRICE', ascending=False)
fig = px.bar(sort_vals, x="TEAM", y="SOLD PRICE",
             color="PLAYING ROLE", barmode='group', text='AUCTION YEAR')
fig.update_layout(title_text = "Team spend by playing role")
fig.show()

Observation

  • The chart breaks each team's spend down by playing role, with the auction year labelled on each bar.